Mongod
Mongod: Mongo daemon. A daemon is a program that runs in the background, so we cannot interact with it directly. Daemon names usually end with d. Examples of daemons include the Redis server and the Docker daemon.
mongod is the core server process of the database: it handles connections and requests and persists data. To interact with the daemon, we need a dedicated client application such as a CLI or a code library. We issue our commands to the client, and the client sends them to the daemon, which does the actual processing.
Each server that is part of a replica set or shard runs its own mongod process. In a multi-server deployment, we can configure our client to communicate with each of the mongod processes as needed.
To start the daemon, we run the mongod command.
Configuration
We can pass mongod a configuration file or individual configuration options. If we run the command as is, mongod applies its default configuration.
Default Configurations
- Port: 27017
- dbpath: /data/db (This is where the database, collections, documents and journals are stored)
- bind_ip: localhost (mongod listens only on the loopback interface, so only clients running on the same machine can connect to the db)
- auth: disabled
Configuration Values
We can provide configuration values either as CLI options or as entries in a YAML file that is passed to mongod as its configuration. CLI options and YAML entries are listed together here, with the most used values described in the following format.
- cliOption or configurationFileSettingInYaml: Description of the option.
Here are the values being used.
- dbpath or storage.dbPath: The directory where all the data files for the database are stored. It also contains the journal files, which provide durability in case of a crash.
  - If we point it to a new directory, mongod must have read and write permissions on that directory, since mongod writes data and the journal to that path.
- port or net.port: The port on which mongod listens for client connections.
- auth or security.authorization: Enables authentication (disabled by default). Regardless of the value of this option, a mongo shell on localhost can connect to mongod. Once the shell is connected, users and their access levels can be configured from that shell instance (see Localhost Exception).
  - To run mongod with the auth option, run the mongod --auth command.
- bind_ip or net.bindIp: The IP addresses of the host on which mongod listens for client connections. E.g. if the server has the address 192.0.2.10 and we pass it as bind_ip, clients that can reach 192.0.2.10 can connect to mongod.
  - Multiple bind_ip values can be specified by separating them with commas.
- fork or processManagement.fork: Tells mongod to run as a real background daemon instead of running in the terminal. To keep watching the logs, we need to tail the log file.
  - This option is not available on Windows.
- --config: Tells mongod (or mongos) to read its configuration from a YAML file.
- -f: Same as --config.
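The pairing between CLI options and configuration file settings listed above can be sketched as a small lookup table. This is plain JavaScript for illustration only and is not part of MongoDB; the helper name toConfigSetting is hypothetical, and only the option and setting names come from the list above.

```javascript
// CLI option -> YAML configuration file setting, as listed above.
const CLI_TO_CONFIG = new Map([
  ["dbpath", "storage.dbPath"],
  ["port", "net.port"],
  ["auth", "security.authorization"],
  ["bind_ip", "net.bindIp"],
  ["fork", "processManagement.fork"],
]);

// Hypothetical helper: returns the config file setting for a CLI option,
// or undefined when the option is not in the table.
// Leading dashes are stripped so "--port" and "port" are treated the same.
function toConfigSetting(cliOption) {
  const name = cliOption.replace(/^-+/, "");
  return CLI_TO_CONFIG.get(name);
}

console.log(toConfigSetting("--dbpath"));
console.log(toConfigSetting("bind_ip"));
```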
Location of config file
When mongod is installed via a package manager, it creates a configuration file automatically. Here are the locations of the file per operating system.
OS | Path |
---|---|
macOS (Intel Processor) | /usr/local/etc/mongod.conf |
macOS (Apple M1 Processor) | /opt/homebrew/etc/mongod.conf |
Linux (apt, yum, or zypper Package Manager) | /etc/mongod.conf |
Windows (MSI Installer) | <install directory>\bin\mongod.cfg |
Options and Mapping
File Structure
Like many databases, mongod stores its data and supporting metadata in various files. None of them are meant to be read by humans; their content only makes sense when read by mongod or MongoDB clients.
There are some .lock files (like WiredTiger.lock or mongod.lock) in the file structure. If these files are present and not empty, either another process is working with the same MongoDB data directory or there was an unclean shutdown. In either case the user may need to resolve this, by stopping the other process or by removing the stale lock files left by the unclean shutdown.
The .wt files are related to data. Files whose names start with collection hold collection data, and files whose names start with index hold index data. In a freshly created database, we may find some default collections and indices created already.
The diagnostic.data directory contains diagnostic information about the mongo server. It does not contain any actual private data. This data is captured by FTDC (Full Time Diagnostic Data Capture) and includes information about the server configuration, some server statistics and the command line options used. To use these files for diagnostics, users have to provide them explicitly to MongoDB support engineers.
Journaling
By default, MongoDB write operations are buffered in memory and flushed to disk every 60 seconds, creating a checkpoint. WiredTiger also uses a write-ahead journal. Journal entries are buffered in memory and synced to disk every 100 ms. Each journal file is limited to 100 MB in size. WiredTiger also flushes when the journal files hold 2 GB of data, and that flush creates another checkpoint.
When the server crashes unexpectedly, some write operations exist only in the journal files. When mongod comes back online, WiredTiger looks at the existing data files to find the identifier of the last checkpoint, searches the journal files for the record that matches that identifier, and applies the operations recorded in the journal since the last checkpoint.
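The recovery steps described above can be illustrated with a toy model. This is a minimal sketch and not WiredTiger's actual implementation: the data shapes, the seq field, and the recover function are all assumptions made for illustration.

```javascript
// Toy model of crash recovery (NOT WiredTiger's real on-disk format).
// dataFiles holds the state as of the last checkpoint; the journal is an
// ordered list of write operations, each tagged with a sequence number.
function recover(dataFiles, journal) {
  const state = new Map(dataFiles.documents);
  for (const entry of journal) {
    // Entries up to the checkpoint are already reflected in the data files.
    if (entry.seq <= dataFiles.checkpointSeq) continue;
    if (entry.op === "set") state.set(entry.key, entry.value);
    else if (entry.op === "delete") state.delete(entry.key);
  }
  return state;
}

const dataFiles = {
  checkpointSeq: 2,
  documents: [["a", 1], ["b", 2]],
};
const journal = [
  { seq: 1, op: "set", key: "a", value: 1 },
  { seq: 2, op: "set", key: "b", value: 2 },
  { seq: 3, op: "set", key: "c", value: 3 },
  { seq: 4, op: "delete", key: "a" },
];

// Only entries 3 and 4 are replayed on top of the checkpointed state.
const recovered = recover(dataFiles, journal);
console.log([...recovered.entries()]);
```

After replay, the recovered state contains b and c: the write to c and the delete of a happened after the last checkpoint and exist only in the journal.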
Basic Commands
There are some helper commands.
Shell helper groups
Methods available in the MongoDB shell that wrap underlying database commands.
- db shell helper: Interacting with the database.
- rs shell helper: Managing replica sets.
- sh shell helper: Managing sharded clusters and their deployment.
Some useful shell helpers
- User Management
  - db.createUser()
  - db.dropUser()
- Collection Management
  - db.{collectionName}.renameCollection()
  - db.{collectionName}.createIndex()
  - db.{collectionName}.drop()
- Database Management
  - db.dropDatabase()
  - db.createCollection()
- Database Monitoring
  - db.serverStatus()
Logging
Two types of Logging. Process Log and …
Process Log
Process logs collect activity in one of the following components.
- ACCESS (accessControl): Messages related to access control.
- COMMAND (command): Messages related to DB commands.
- CONTROL (control): Messages related to control activities like initialization.
- FTDC (ftdc): Messages related to the diagnostic data collection mechanism.
- GEO (geo): Messages related to parsing of geospatial shapes (lat long and places).
- INDEX (index): Messages related to indexing operations.
- NETWORK (network): Messages related to networking activities such as accepting connections.
- QUERY (query): Messages related to queries, including query planner activities.
- REPL (replication namespace): Messages related to replica sets, such as initial sync & heartbeats.
- SHARDING (sharding namespace): Messages related to sharding.
- WRITE (write): Messages related to write operations.
Each component has a verbosity value attached to it. -1 means inherit from the parent. The default value 0 means informational level. Verbosity ranges from 0 to 5. The higher the verbosity level, the more messages are printed for that component.
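The inheritance rule for verbosity can be sketched as follows. This is a toy resolver written for illustration, not MongoDB code: the settings object, its layout, and the function name effectiveVerbosity are assumptions, and dotted names are used here to express a parent and child relationship between components.

```javascript
// Hypothetical settings: a global default plus per-component overrides.
// -1 means "inherit from the parent component".
const settings = {
  default: 1,
  components: {
    query: 2,
    replication: -1,
    "replication.heartbeats": -1,
  },
};

// Walks from the component up through its parents until it finds an
// explicit level (0 to 5); falls back to the global default otherwise.
function effectiveVerbosity(config, component) {
  let name = component;
  while (name) {
    const level = config.components[name];
    if (level !== undefined && level !== -1) return level;
    // Drop the last dotted segment to move to the parent component.
    const dot = name.lastIndexOf(".");
    name = dot === -1 ? "" : name.slice(0, dot);
  }
  return config.default;
}

console.log(effectiveVerbosity(settings, "query"));
console.log(effectiveVerbosity(settings, "replication.heartbeats"));
```

In this sketch, query uses its own level, while replication.heartbeats inherits through replication and ends up with the global default.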
Profiler
The profiler is used to measure db performance. Most of the time, logs don't reveal the whole story. Profilers are enabled at the database level, so each database can be profiled separately.
When enabled, the profiler stores data for the operations of the given db in a new collection called system.profile. It holds data on CRUD operations, admin operations and config options.
Three setting values.
Value | Meaning |
---|---|
0 (Default) | Profiler is off and does not collect any data |
1 | Profiler collects data for operations that take longer than the value of slowms |
2 | Profiler collects data for all operations |
- We can get the current profiling level with db.getProfilingLevel(), and db.getProfilingStatus() returns an object such as { was: 0, slowms: 100, sampleRate: 1, ok: 1 }.
  - Here was is the current profiling level (or the previous one, when it is returned by a call that changes the level).
  - slowms is the threshold in milliseconds above which an operation is considered slow.
- We can set the profiler level with db.setProfilingLevel(profileLevel, {ProfileParams}).
  - We can set slowms using this method call.
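The three profiling levels can be summarized as a small decision function. This is a toy sketch, not the server's logic: the function name isProfiled is hypothetical, and it deliberately ignores sampleRate and any other parameters the real profiler takes into account.

```javascript
// Decides whether an operation would be recorded in system.profile,
// based only on the profiling level and the slowms threshold.
function isProfiled(level, slowms, durationMs) {
  if (level === 2) return true;                // collect data for all operations
  if (level === 1) return durationMs > slowms; // collect only slow operations
  return false;                                // level 0: profiler is off
}

console.log(isProfiled(0, 100, 500));
console.log(isProfiled(1, 100, 500));
console.log(isProfiled(1, 100, 20));
console.log(isProfiled(2, 100, 20));

// In mongosh, level 1 with a 100 ms threshold would be set with:
//   db.setProfilingLevel(1, { slowms: 100 })
```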
Authentication & Authorization
Security concepts:
Authentication is used to verify who the user is.
- Challenge: Something the user must answer to prove their identity, AKA "who is the user?" and "how does the user prove their identity?".
- Response: The response from the user to the challenge, AKA the username and password, or a federated login.
- Validation: The server-side logic that validates the response to the challenge. If the response is valid, the user is authenticated; if not, the server keeps issuing challenges.
Authorization answers "what are the privileges of the user?".
Authentication establishes and validates who the user is, and authorization determines what kind of access this user has. For example, when I provide my GitHub credentials, GitHub knows who I am, which is authentication. GitHub also stores which repositories I can access, which is authorization. This authorization data covers all open source repositories, the repositories I have created, and the private repositories I am allowed to see.
MongoDB authentication mechanisms
Available in all versions
- SCRAM (Salted Challenge Response Authentication Mechanism): Default authentication mechanism.
  - Here MongoDB provides a challenge that the user must respond to.
  - Equivalent to password authentication.
- X.509: Uses an X.509 certificate for authentication.
Available in enterprise versions
- LDAP (Lightweight Directory Access Protocol): Basis of Microsoft AD.
- Kerberos: Powerful authentication protocol designed by MIT.
Intra-cluster Authentication
Nodes in a cluster authenticate to each other as members of a replica set. They can either use SCRAM-SHA-1 with a key file that is generated once and shared between the nodes (the approach used in the replication labs), or they can authenticate using X.509 certificates, which are issued by the same authority and can be distinct for each node.
Authorization
- RBAC (role-based access control) is implemented
- Each user has one or more roles
- Each role has one or more privileges
- Each privilege represents a group of actions and the resources where those actions are applied
- E.g. there are three roles, Admin, Developer and Performance Inspector
  - Admin has all the privileges.
  - Developer can create and update collections, indices and data. They can also delete indices and data, but cannot change any performance tuning parameter.
  - Performance Inspector can view all the data and indices, and can see and change performance tuning parameters.
Localhost Exception
Even with auth enabled, MongoDB doesn't provide any default users; we have to create one ourselves. The Localhost Exception exists so that we can connect with mongosh running on the same machine as mongod and create a user.
The Localhost Exception doesn't apply in the following scenarios.
- Once a user is created, the Localhost Exception no longer applies, so it's desirable to create the first user with admin privileges.
- If we define any authentication mechanism, the Localhost Exception doesn't apply when we connect to mongod or mongos using their replica set or shard URLs. It only applies if we connect to that specific node.
Role structure
- Specific DB and specific collection: {db:'databaseName',collection:'collectionName'}
- All databases and all collections: {db:'',collection:''}
- All collections in a given DB: {db:'databaseName',collection:''}
- Collection name in any database: {db:'',collection:'collectionName'}
  - This means the privilege gives access to the given collection name in all databases.
- Cluster resources: {cluster: true}
If someone needs to be able to remove and update documents in the accounts collection of any database, the privilege looks like this.
{resource: {db:'',collection:'accounts'}, actions:["remove","update"]}
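How a privilege document combines a resource pattern with a list of actions can be sketched with a toy matcher. This is only an illustration of the idea, not MongoDB's authorization code: the functions resourceMatches and isAllowed are hypothetical, the sketch uses the resource and actions field names of MongoDB privilege documents, and it ignores special cases such as system collections and cluster resources.

```javascript
// An empty string in a resource pattern acts as a wildcard.
function resourceMatches(pattern, target) {
  const dbOk = pattern.db === "" || pattern.db === target.db;
  const collOk =
    pattern.collection === "" || pattern.collection === target.collection;
  return dbOk && collOk;
}

// True if any privilege grants the action on the target resource.
function isAllowed(privileges, action, target) {
  return privileges.some(
    (p) => p.actions.includes(action) && resourceMatches(p.resource, target)
  );
}

// Privilege: remove and update on the accounts collection of any database.
const privileges = [
  { resource: { db: "", collection: "accounts" }, actions: ["remove", "update"] },
];

console.log(isAllowed(privileges, "update", { db: "sales", collection: "accounts" }));
console.log(isAllowed(privileges, "update", { db: "sales", collection: "orders" }));
console.log(isAllowed(privileges, "find", { db: "sales", collection: "accounts" }));
```

Only the first check passes: the second targets a different collection, and the third asks for an action the privilege does not list.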
A role can inherit from other roles.
We can also define network restrictions (source IP or destination address) in role definitions.
Built in Roles
MongoDB provides built-in roles, organized in 5 groups.
- Database User
  - read
  - readAnyDatabase *
  - readWrite
  - readWriteAnyDatabase *
- Database Administration
  - dbAdmin
  - dbAdminAnyDatabase *
  - userAdmin
  - userAdminAnyDatabase *
  - dbOwner
- Cluster Administration
  - clusterAdmin
  - clusterManager
  - clusterMonitor
  - hostManager
- Backup/Restore
  - backup
  - restore
- Super Admin
  - root *
Roles are assigned to a user at the database level. A user can have different roles on different databases, and the access rights on one database do not affect those on another.
The roles marked with * apply to all databases.
We can also create custom roles to fit our needs.
User Defined Roles
User defined roles can be created on a given database, and can only inherit roles created in the same database. If we need to share a user defined role between databases (a global role), the role must be created in the admin database.
- To create a role, call the db.createRole({role,privileges:[{resource:{db,collection},actions}],roles:[{role,db}]}) command in the database where you want to create the role.
  - Here role stands for the role name.
  - privileges denotes which privileges the role should have.
  - roles defines which roles we are inheriting from.
The MongoDB documentation lists the resources and actions available when creating or updating user defined roles.
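A role document of the shape passed to db.createRole can be assembled as a plain object. The sketch below is illustrative: the buildRole helper, the role name accountsEditor, and the sales database are hypothetical, while the role, privileges, resource, actions and roles fields follow the structure described above.

```javascript
// Hypothetical helper that builds a role document for db.createRole.
// The role grants the given actions on one collection and inherits the
// listed roles from the same database.
function buildRole(name, dbName, collection, actions, inherited) {
  return {
    role: name,
    privileges: [{ resource: { db: dbName, collection }, actions }],
    roles: inherited.map((role) => ({ role, db: dbName })),
  };
}

const roleDoc = buildRole(
  "accountsEditor",
  "sales",
  "accounts",
  ["remove", "update"],
  ["read"]
);
console.log(JSON.stringify(roleDoc, null, 2));

// In mongosh, after switching to the sales database, the document would be
// passed as: db.createRole(roleDoc)
```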
- To add new roles to a user, call the db.grantRolesToUser method. Calling db.updateUser with new roles overrides all existing roles.
- To update a role, there are three ways.
  - Get the role, update the privileges document and call the db.updateRole command. This is more powerful because it replaces privileges entirely.
  - Call the db.grantPrivilegesToRole command to add new privileges (new actions and/or new resources).
    - Similarly, the db.revokePrivilegesFromRole command revokes existing privileges from an existing role.
  - Call db.grantRolesToRole to let a given user defined role inherit new role(s).
    - Similarly, db.revokeRolesFromRole removes existing role inheritance.
To see what a role can do on a given database, run the following command.
db.runCommand( { rolesInfo: { role: "ROLE_NAME", db: "DATABASE_NAME" }, showPrivileges: true} )
This command returns the following interesting properties.
- db: The database in which the role is defined.
- role: Name of the role.
- roles: Roles that this role directly inherits from.
- privileges: Lists all privileges that this role defines.
- inheritedRoles: All roles inherited by the given role, directly or indirectly.
- inheritedPrivileges: Privileges coming from inherited roles.
- isBuiltin: Boolean flag that denotes whether this role is built in.
The dbOwner role is a super admin for a given database. It has the privileges of the readWrite, dbAdmin and userAdmin roles for that database.
Server tools
We have mongod, which is the daemon, and mongosh (or mongo in older versions), which is the CLI application that connects to the mongo daemon. There are other tools to help us.
To list all available CLI tools on a Mac, run the following command.
find $(dirname $(which mongod)) -name "mongo*"
This first runs which mongod, which prints the full path of the mongod binary. dirname then reduces that path to its containing directory, and find searches that directory for every CLI tool whose name starts with mongo.
Here are some tools and their descriptions.
- mongostat: Quick statistics about a mongod or mongos process.
- mongorestore and mongodump: Import and export dump files from MongoDB collections. These files are in bson format. The exports also include a metadata file that describes the collection and its indexes.
- mongoexport and mongoimport: Import and export data from a MongoDB collection. These files are in json or csv format (decided by the --type parameter). By default, mongoexport writes to stdout and mongoimport reads from stdin. To use files, pass the --out parameter to mongoexport or the --file parameter to mongoimport. These exports don't have metadata files, so MongoDB has to be told where to import the collection (defaults to the test.{fileName} collection).
For bson, use dump and restore (DB-specific terms), and for json, use export and import (data-specific terms).